EXPERIMENTAL  ERRORS  IN
DERIVED  THERMODYNAMIC  CONSTANTS 

C.  R.  Frink  and  P.  E.  Waggoner 


Experimental  determination  of  thermodynamic  equilibrium  constants 
is  a  popular  and  powerful  technique  of  soil  chemistry.  The  applicability 
of  a  particular  theory  is  generally  established  by  demonstrating  that  the 
constant  in  the  mathematical  statement  of  the  theory  is  invariant  over  a 
wide  range  of  conditions.  Even  when  the  theory  applies,  however,  the 
estimates  of  the  constant  will  vary  because  the  observations  from  which 
they are derived are inexact. Thus, a statistical test is required to
determine whether the variability in the estimates of the derived constant
exceeds  the  error  inherent  in  their  measurement. 

The  usual  analysis  of  variance  (12)  would  be  a  satisfactory  test, 
provided  that  the  necessary  replicate  chemical  determinations  were  made. 
Unfortunately, chemists rarely provide such data, owing either to an
aversion towards extra work or towards statistics. However, many procedures
have  been  in  use  so  long  that  an  estimate  of  their  precision  can  be  taken 
as  the  precision  of  the  entire  population  of  determinations  by  these 
methods  (12).  Thus,  we  feel  justified  in  estimating  the  precision  of  routine 
chemical  analyses  from  experience.  Now  we  inquire  how  we  may  use 
these  estimates  to  predict  the  variability  expected  in  a  derived  constant 
from  a  knowledge  of  the  variability  of  its  several  constituent  measurements. 

Kolthoff and Sandell (9) have summarized earlier work (1) and discussed
the errors expected in derived results. If the result, e.g. a constant,
is calculated as a sum or difference of its constituent measurements, its
variance is the sum of the variances of the individual measurements.
If the result is a product or quotient, its squared coefficient of variation
is the sum of the squared coefficients of variation of the individual
measurements. Similar procedures are used in separating sampling from
analytical errors (12). While these methods seem satisfactory for many
purposes, they do not indicate clearly how sums containing various
coefficients are treated, nor do they describe products with exponential
terms. A more serious defect is the neglect of cases where two or more
variables are correlated. In this case, a correction term containing the
correlation coefficient must be introduced (11).

Recently,  Ku  (10)  has  treated  the  problem  of  propagation  of  errors 
in  a  more  systematic  fashion.  He  utilized  a  theorem  which  relates  the 
variance  of  a  function  f(x,y)  to  the  first  and  second  partial  derivatives 
with  respect  to  the  arguments  x  and  y,  as  well  as  to  the  variance  and 
covariance  of  x  and  y.  For  simple  functions  where  the  partial  derivatives 
may  be  written  explicitly,  his  approach  is  more  elegant;  however,  this 
is not always the case in the functions we shall encounter. In some
instances, the partial derivatives could be evaluated graphically. In general,
however,  the  approach  we  propose  below  seems  more  suitable  for  the 
functions  likely  to  be  encountered  in  chemical  equilibria. 

Thus,  we  shall  apply  these  concepts  to  an  evaluation  of  the  errors 
to  be  expected  in  derived  thermodynamic  constants  and  then  compare 
our  predicted  errors  with  those  observed  in  experimental  data.  Different 
ways  of  reading  this  bulletin  are  suggested  for  different  purposes.  Logically, 
THEORY  and  COMPUTATIONAL  AIDS  precede  APPLICATIONS, 
and  this  is  the  order  that  follows.  Many  may  wish,  however,  first  to  see 
the  usefulness  of  the  methods  in  testing  the  constancy  of  the  equilibrium 
constant in a specific chemical reaction. They should go to
APPLICATIONS first and return to THEORY and AIDS as needed.


Theory 

Inasmuch  as  indicated  products  or  quotients  in  equilibrium  constants 
can  easily  be  expressed  as  linear  functions  of  logarithmic  terms,  we  need 
only  develop  an  equation  for  the  variance  of  a  sum  of  variates.  Following 
Weatherburn (11), we let $u_i$ be a linear function of the variates $x_i$, $y_i$,
$z_i$ ..., with known constants a, b, c ... either positive or negative:

$u_i = a x_i + b y_i + c z_i + \dots$  [1]

The expectation or limiting mean of $u_i$ is related to the other variates
thus: 

E(u)  =  aE(x)  +  bE(y)  +  cE(z)  +  .  .  .  [2] 

From  [1]   and  [2]  we  obtain  by  subtraction: 

$\delta u_i = a\,\delta x_i + b\,\delta y_i + c\,\delta z_i + \dots$  [3]

where $\delta$ indicates the deviations of the variates about their limiting means.
If we recall that the variance of any of the variates, say x, is by definition:

$\sigma_x^2 = E[(\delta x)^2]$  [4]

and that the correlation coefficient between any two variates, say x and
y, is:

$\rho_{xy} = \dfrac{E[(\delta x)(\delta y)]}{\sigma_x \sigma_y}$  [5]

we may find the desired formula from [3] by squaring both sides and
taking expectations. The result is:

$\sigma_u^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + c^2\sigma_z^2 + 2ab\rho_{xy}\sigma_x\sigma_y + 2ac\rho_{xz}\sigma_x\sigma_z + 2bc\rho_{yz}\sigma_y\sigma_z$  [6]
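
Equation [6] can be evaluated numerically for any number of variates. The following is a minimal Python sketch and not part of the original bulletin; the function name `linear_combination_variance` and the dictionary layout used for the correlation coefficients are assumptions made for illustration, and the numbers are arbitrary.

```python
def linear_combination_variance(coeffs, sigmas, correlations=None):
    """Variance of u = a*x + b*y + c*z + ... following equation [6].

    coeffs       -- known constants a, b, c ... (positive or negative)
    sigmas       -- standard deviations of x, y, z ...
    correlations -- hypothetical layout: {(j, k): rho_jk} with j < k;
                    pairs that are omitted are treated as uncorrelated
    """
    # Squared terms: a^2 sigma_x^2 + b^2 sigma_y^2 + ...
    variance = sum((a * s) ** 2 for a, s in zip(coeffs, sigmas))
    # Cross terms: 2 a b rho_xy sigma_x sigma_y for each correlated pair
    for (j, k), rho in (correlations or {}).items():
        variance += 2 * coeffs[j] * coeffs[k] * rho * sigmas[j] * sigmas[k]
    return variance


# Illustrative values only: u = 3x + 4y with unit, uncorrelated errors.
independent = linear_combination_variance([3, 4], [1.0, 1.0])

# Illustrative values only: u = x - y with equal, perfectly correlated errors.
cancelling = linear_combination_variance([1, -1], [0.1, 0.1], {(0, 1): 1.0})
```

With uncorrelated errors only the squared terms contribute; in the second call the cross term cancels them, so the predicted variance is essentially zero.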

Since $u_i$ in [1] corresponds to -log K or pK in the usual logarithmic
expression of equilibrium constants, the hypothesis to be tested is that $u_i$
is constant. In other words we shall test whether the observed variance of
$u_i$ given by:

$s_u^2 = \dfrac{\sum (u_i - \bar{u})^2}{n - 1}$  [7]

exceeds the predicted variance $\sigma_u^2$ given by [6]. Since we are comparing
an observed variance with a theoretical one, a chi-square test is appropriate
for testing whether the quantity calculated by equation [7] is statistically
greater than that given by [6].

At this point, we must recognize several complications in using
equation [6]. First, the variates $x_i$, $y_i$, $z_i$ ... are usually not measured directly,
but will be calculated from analytical determinations of the experimental
variables $R_i$, $S_i$, $T_i$ .... Further, the calculated variates may be non-linear
functions of more than one experimental variable, so we define:

$x_i = f(R_i, S_i, T_i)$  [8]

$y_i = g(R_i, S_i, T_i)$  [9]

$z_i = h(R_i, S_i, T_i)$  [10]

Thus, an experiment consists of measuring $R_i$, $S_i$, $T_i$ ... during some
systematic manipulation of the experimental conditions, and then
calculating $u_i$ from equations [1] and [8] to [10]. More specifically, $u_i$ would be
a $pK_i$, while $x_i$ might be the concentration of Al+3 ions calculated from
$R_i$, a pH measurement, and $S_i$, a measurement of total aluminum (6, 7).
Since we want to calculate the predicted variance from [6], we must
be prepared to cope with the non-linearity of [8] to [10]. The
consequence of this non-linearity, of course, is that the mean of $x_i$, for
example, is not equivalent to $f(\bar{R}, \bar{S}, \bar{T})$. Furthermore, even though the
experimental errors, which we will define as $\delta R_i$, $\delta S_i$, $\delta T_i$ ..., may be
independent with zero expected means, the errors $\delta x_i$, $\delta y_i$, $\delta z_i$ ... are
neither independent nor of zero expectation.

To examine the magnitude of the errors introduced by these
assumptions, we may express [8] as:

$x_i = f(\bar{R} + \delta R_i,\ \bar{S} + \delta S_i,\ \bar{T} + \delta T_i)$  [11]

Here we have chosen to use sample estimates, rather than population
parameters, i.e. $\bar{R}$ rather than E(R), since we have defined $R_i$, $S_i$, $T_i$ ...
as the experimental observations. Expansion of the right-hand member of
[11] by means of Taylor's series (cf. (2) for a similar treatment of sampling
errors) leads to:

$x_i = f(\bar{R}, \bar{S}, \bar{T}) + \delta R_i\, f'_R(x_i) + \tfrac{1}{2}\,\delta R_i^2\, f''_R(x_i) + \dots$
$\qquad + \delta S_i\, f'_S(x_i) + \tfrac{1}{2}\,\delta S_i^2\, f''_S(x_i) + \dots$
$\qquad + \delta T_i\, f'_T(x_i) + \tfrac{1}{2}\,\delta T_i^2\, f''_T(x_i) + \dots$  [12]

We may obtain the expectation of [12] by summing over all values of i
and dividing by N. It is evident that if the function is linear, the
expectation of $x_i$ is indeed $f(\bar{R}, \bar{S}, \bar{T})$. Furthermore, if the errors of observation
$\delta R_i$, $\delta S_i$, $\delta T_i$ ... are small, or the departure from linearity is not great,
we may approximate the expectation of $x_i$ as $f(\bar{R}, \bar{S}, \bar{T})$. Similar
considerations hold for [9] and [10]. In practice, we shall usually find that both of
these criteria are met; if not, a graphical evaluation of [12] is probably
the easiest approach.

The second complication in using equation [6] concerns the nature
of the population from which our observations of $u_1$, $u_2$ ... $u_n$ are drawn.
Equation [6], as derived, is applicable to repeated observations of
$R_i$, $S_i$, $T_i$ ... and the subsequent calculation of $u_i$ from $x_i$, $y_i$, $z_i$ ... for
samples drawn from the same population. We wish to enquire, however,
whether $u_i$ is constant when the experimental conditions $R_i$, $S_i$, $T_i$ ...
are varied in a deliberate fashion as by dilution or acidification. Thus, we
have created several populations of samples, and consequently the means
$\bar{R}$, $\bar{S}$, $\bar{T}$ ... have little meaning since the bulk of the variation is
non-random. In addition, since these samples are drawn from different
populations, we cannot assume homogeneity of variance; in fact, we find
from numerical calculations that the variance of $x_i$, $y_i$, $z_i$ ... depends
on the values of $R_i$, $S_i$, $T_i$ ....

Obviously, we need to compare values of $u_i$ from different
populations in order to provide a discriminating test of the constancy of pK
over varying conditions. Thus, we must cope with the heterogeneous
variance introduced. One solution is to calculate from [6] the estimated
$\sigma_i^2$ for each $u_1$, $u_2$ ... $u_n$ determined. Then, these estimates of the variance
may be weighted in the computation of chi-square as follows:

$\chi^2 = \dfrac{(n-1)s^2}{\sigma^2} = \sum \left[ \dfrac{(u_i - \bar{u})^2}{\sigma_i^2} \right]$  [13]

where it is understood that the variance $\sigma_i^2$ is taken at each value of i.
We assume that this statistic is distributed approximately as chi-square
with n - 1 degrees of freedom.
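
The weighted statistic can be sketched in a few lines of Python. This is an illustration only: the function name `weighted_chi_square` and the pK values and variances passed to it are hypothetical, not data from the experiments discussed here.

```python
def weighted_chi_square(u_values, predicted_variances):
    """Return (chi-square, degrees of freedom) for the weighted statistic.

    u_values            -- observed u_1 ... u_n (for example pK estimates)
    predicted_variances -- variance predicted from equation [6] for each u_i
    """
    n = len(u_values)
    u_mean = sum(u_values) / n  # unweighted mean of the observed values
    # Each squared deviation is divided by its own predicted variance.
    chi_square = sum(
        (u - u_mean) ** 2 / var
        for u, var in zip(u_values, predicted_variances)
    )
    return chi_square, n - 1


# Hypothetical pK estimates and predicted variances, for illustration only.
chi_square, dof = weighted_chi_square([5.0, 5.1, 4.9], [0.01, 0.01, 0.01])
```

The returned value would then be compared with a tabulated chi-square critical value for n - 1 degrees of freedom at the chosen significance level.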

Finally, we must recognize one further complication in the use of
equation [6], which requires that we distinguish carefully between errors
of measurement which are correlated and those which are not. Normally,
in chemical equilibria, a change in the concentration of one chemical
species produces changes in the concentration of all other species;
consequently, all the variates are correlated. Clearly, this is the case when
replicate samples are prepared: an error in one variate will produce
corresponding errors in the other variates. As a simple example, consider the
preparation of three replicate solutions of KCl, either by weighing three
portions of salt or by dilution of three aliquots of a standardized stock
solution. In either case, the analyst is not likely to measure the resulting
concentrations, but will calculate them from the known weight or dilution
factors. Since errors in weighing or dilution are inevitable, he has thus
introduced an error which we will call the sample error. If we now require
the concentration of K+ plus Cl- ions in the solution, the variance of the
sum will be given by equation [6]. Obviously, in this case, since the
correlation coefficient between the calculated concentrations of the two ions
is positive and equal to unity, and their individual variances are the same,
the variance of the sum is four times the variance of the original error
made in preparing the sample. At this point, it is well to note that this use
of the term sampling error is quite specific and does not coincide with the
usual usage: we are not concerned here with the ability of the analyst
to obtain replicate sub-samples of a bulk shipment of muriate of potash
in order to determine its KCl content.
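
The factor of four in the KCl example follows directly from equation [6] with a = b = 1 and a correlation coefficient of unity. A short Python check, using a hypothetical standard deviation for the preparation error:

```python
def variance_of_ion_sum(sigma_k, sigma_cl, rho):
    """Variance of [K+] + [Cl-] from equation [6] with a = b = 1."""
    return sigma_k ** 2 + sigma_cl ** 2 + 2 * rho * sigma_k * sigma_cl


sample_sigma = 0.01  # hypothetical standard deviation of the preparation error

# Both calculated concentrations carry the same preparation error (rho = 1).
correlated = variance_of_ion_sum(sample_sigma, sample_sigma, rho=1.0)
ratio_to_single = correlated / sample_sigma ** 2  # about 4
```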

Continuing with this example, suppose now that we wish to measure
the K+ and Cl- concentrations in three replicate solutions. Since we
cannot measure these variates precisely, we introduce additional
uncertainties. However, unless one variate is calculated from measurements of
another, as implied in [8] to [10], the uncertainties of observation are
not correlated. For example, we might determine K+ by flame photometry
and Cl- by titration with AgNO3. Then the variance of the sum of the
two observations would be merely the sum of the variances of the two
independent analytical methods. Notice, however, that if we determined
only the Cl- content, and calculated the K+ content by equating the two,
the errors of observation would be correlated and the variance would
increase accordingly.

Thus our definition of the deviations $\delta R_i$, $\delta S_i$, $\delta T_i$ ... and the
corresponding $\delta x_i$, $\delta y_i$, $\delta z_i$ ..., which specifies that they are measured about the
observed means, includes both of these error terms. We may redefine
$x_i = E(x) + \delta x_{oi} + \delta x_{si}$ or $R_i = \bar{R} + \delta R_{oi} + \delta R_{si}$, where s indicates the
sample error and o the observational error. If we assume that there is no
correlation between these two sources of error, the expected variance can
be derived in the same fashion as equation [6]. The result for two variates
is:

$\sigma_{su}^2 + \sigma_{ou}^2 = a^2\sigma_{sx}^2 + a^2\sigma_{ox}^2 + b^2\sigma_{sy}^2 + b^2\sigma_{oy}^2 + 2ab\rho_s\sigma_{sx}\sigma_{sy} + 2ab\rho_o\sigma_{ox}\sigma_{oy}$  [14]

This equation is of course identical to [6] and merely indicates that the two
variances are additive. Note that the additivity of the two components of
variance is only true for single observations on single samples. Since
replicate observations will reduce $\sigma_o^2$ but not $\sigma_s^2$, while replicate samples
will reduce the variance from both components (3), the variance of the
mean of n observations of k samples is $(\sigma_o^2/n + \sigma_s^2)/k$. It appears simpler
to retain our original definitions of $\delta R_i$, $\delta S_i$, $\delta T_i$ ... to include all
deviations about the means and simply choose the proper correlation
coefficients. Thus, we need to distinguish carefully between the variance
expected for replicate observations of a single sample and that expected for
single observations of replicate samples. The variance expected for replicate
observations may be estimated from our chemical experience (12). We may also
estimate the errors in preparation of replicate samples from our knowledge
of the errors of common laboratory operations. If other types of
replication are involved, as in the determination of solubility products in replicate
soil extracts, the greater variability inherent in soil samples could be
estimated.

In  summary,  we  have  derived  the  necessary  equation  for  the  variance 
of  a  sum  of  variates  and  have  explored  the  complications  in  its  use: 
first, the calculated variates must be nearly linear functions of the
experimental variables which in turn are measured with small error; second,
the  non-homogeneous  variance  from  one  population  to  another  must  be 
compensated  for  by  the  proposed  weighted  chi-square;  and  third,  sampling 
and  observational  errors  must  be  clearly  distinguished. 

Computational  Aids 

Before turning to the experimental examples, we must consider
several computational aids. Data frequently are collected in both arithmetic
and logarithmic units and confusion often arises over their
inter-conversion. Furthermore, since many analytical errors are proportional to the
amount taken for analysis, statistical practice (3, 12) calls for a
transformation to logarithmic units to provide homogeneity of variance. We shall
show that for all practical purposes, when the errors of observation are
small, the required calculations of means and standard deviations may be
made before or after inter-conversion from arithmetic to logarithmic units.
Let $w_i$ be a measurement that has not been converted to logarithms
and $e_i$ be its relative error, $w_i/\bar{w} - 1$. Then we require the relationship
between $\overline{\log w}$ and $\log \bar{w}$, and between $s_{\log w}^2$ and $s_e^2$. Again, we prefer the
notation for sample estimates, rather than population parameters E(log
w) and log E(w). Use will be made of the series:

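$\log(k + e) = M \ln(k + e) = M\left\{ \ln k + 2\left[ \dfrac{e}{2k+e} + \dfrac{1}{3}\left(\dfrac{e}{2k+e}\right)^3 + \dots \right] \right\}$  [15]

where M = 0.4343 is the factor converting natural to common logarithms.
Since:

$\overline{\log w} = \dfrac{\sum \log w_i}{n} = \dfrac{\sum \log[(w_i/\bar{w})(\bar{w})]}{n} = \log \bar{w} + \dfrac{1}{n}\sum \log\dfrac{w_i}{\bar{w}}$  [16]

we see that the last term in the right-hand member of [16] is the
difference between the mean of the logarithms and the logarithm of the
means. The error introduced by this term is evaluated by equation [15].
Since $w_i = \bar{w} + e_i\bar{w}$:

$\log\dfrac{w_i}{\bar{w}} = \log\dfrac{\bar{w} + e_i\bar{w}}{\bar{w}} = \log(1 + e_i)$  [17]

If we expand [17] with [15] and realize k = 1, we see:

$\dfrac{\sum \log(1 + e_i)}{n} = \dfrac{2M}{n}\sum\left[ \dfrac{e_i}{2+e_i} + \dfrac{1}{3}\left(\dfrac{e_i}{2+e_i}\right)^3 + \dots \right]$  [18]

In laboratory analyses of the sort considered here, a relative error $e_i$ of
0.1 would be extremely large. Therefore, the goodness of approximation
is conservatively evaluated by assuming half the $e_i$ are -0.1 and half
are 0.1. When $e_i$ is 0.1, the cubic term in [18] is negligible and can be
omitted, and:

$\overline{\log w} - \log \bar{w} = \dfrac{2M}{n}\left[ \dfrac{n}{2}\left(\dfrac{-0.1}{1.9}\right) + \dfrac{n}{2}\left(\dfrac{0.1}{2.1}\right) \right] = -0.002$  [19]

Thus, we conclude that the mean of the logarithms is for all practical
purposes equal to the logarithm of the means.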
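
The size of this difference is easy to check numerically for the extreme case of relative errors of -0.1 and 0.1. The sketch below is illustrative; the function names are not from the bulletin, and M is taken as the common logarithm of e.

```python
import math

M = math.log10(math.e)  # about 0.4343, converts natural to common logarithms


def mean_log_minus_log_mean(rel_errors):
    """Exact mean of log10(1 + e_i).

    This equals mean(log w) - log(mean w) when the relative errors
    average to zero, as in the extreme case considered here.
    """
    return sum(math.log10(1 + e) for e in rel_errors) / len(rel_errors)


def first_term_estimate(rel_errors):
    """The same quantity keeping only the first term of the series."""
    return 2 * M / len(rel_errors) * sum(e / (2 + e) for e in rel_errors)


errors = [-0.1, 0.1]  # half the relative errors at -0.1, half at 0.1
exact = mean_log_minus_log_mean(errors)
approx = first_term_estimate(errors)
```

Both values come out close to -0.002, so dropping the cubic term does not change the conclusion.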

Now we consider the relationship between $s_{\log w}^2$ and $s_e^2$.

Employing [17]:

$(n-1)\,s_{\log w}^2 = \sum(\log w_i - \log \bar{w})^2 = \sum[\log(1 + e_i)]^2 = M^2 \sum[\ln(1 + e_i)]^2$  [20]

Expanding [20] by means of [15]:

$(n-1)\,s_{\log w}^2 = 4M^2 \sum\left[ \left(\dfrac{e_i}{2+e_i}\right)^2 + \dfrac{2}{3}\left(\dfrac{e_i}{2+e_i}\right)^4 + \dots \right]$  [21]

Again, if the extreme case that half the $e_i$ are -0.1 and half 0.1 is
assumed, only the first term within the brackets of [21] need be retained.
For this case, we shall demonstrate that $s_{\log w}^2$ is approximately $M^2 s_e^2$ by
dividing

$(n-1)\,M^2 s_e^2 = M^2 \sum(e_i - \bar{e})^2 = M^2 \sum e_i^2$  [22]

into [21]. The quotient is:

$\dfrac{s_{\log w}^2}{M^2 s_e^2} = \dfrac{4M^2\left[ \dfrac{n}{2}\left(\dfrac{-0.1}{1.9}\right)^2 + \dfrac{n}{2}\left(\dfrac{0.1}{2.1}\right)^2 \right]}{M^2\, n\, (0.1)^2} = 1.008$  [23]

Thus, for relative errors of 0.1 or less and for all practical purposes, $s_{\log w}^2$
equals $M^2 s_e^2$. In addition, it is readily shown that $s_e^2$ is the square of the
coefficient of variation, namely $s_e^2 = s_w^2/\bar{w}^2$. Thus, we can convert between
linear and logarithmic data as necessary.
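
The quotient can be verified numerically for the same extreme case. In this illustrative sketch (function names are assumptions) the common factor n - 1 cancels, so only the sums are needed.

```python
import math

M = math.log10(math.e)  # about 0.4343


def log_variance_ratio(rel_errors):
    """Exact ratio of sum[log10(1 + e_i)]^2 to M^2 * sum(e_i^2)."""
    numerator = sum(math.log10(1 + e) ** 2 for e in rel_errors)
    denominator = M ** 2 * sum(e ** 2 for e in rel_errors)
    return numerator / denominator


def first_term_ratio(rel_errors):
    """Ratio keeping only the leading series term, 4*sum(z^2)/sum(e^2),
    where z = e / (2 + e); the factor M^2 cancels."""
    leading = 4 * sum((e / (2 + e)) ** 2 for e in rel_errors)
    return leading / sum(e ** 2 for e in rel_errors)


errors = [-0.1, 0.1]
exact_ratio = log_variance_ratio(errors)
leading_ratio = first_term_ratio(errors)
```

Either way the ratio exceeds unity by less than one per cent, which supports treating the variance of the logarithms as M squared times the squared coefficient of variation.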

For computational purposes, it is frequently convenient to have an
expression for the variance of a product or ratio without the necessity for
conversion to logarithms. Using our previous approximation
$s_{\log w}^2 = M^2 s_w^2/\bar{w}^2$, we may derive the variance of u = xy. Since the variance of
log u is:

$s_{\log u}^2 = s_{\log x}^2 + s_{\log y}^2 + 2r\, s_{\log x}\, s_{\log y}$  [24]

we readily obtain:

$\dfrac{s_u^2}{\bar{u}^2} = \dfrac{s_x^2}{\bar{x}^2} + \dfrac{s_y^2}{\bar{y}^2} + \dfrac{2r\, s_x s_y}{\bar{x}\bar{y}}$  [25]

Since, according to [12], $\bar{u}$ is approximately equal to $\bar{x}\bar{y}$ for small variations
in x and y:

$s_u^2 = (\bar{y})^2 s_x^2 + (\bar{x})^2 s_y^2 + 2r\,\bar{x}\bar{y}\, s_x s_y$  [26]

By a similar process, we obtain the variance of u = x/y:

$s_u^2 = \dfrac{(\bar{x})^2}{(\bar{y})^2}\left[ \dfrac{s_x^2}{\bar{x}^2} + \dfrac{s_y^2}{\bar{y}^2} - \dfrac{2r\, s_x s_y}{\bar{x}\bar{y}} \right]$  [27]
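
The product and ratio formulas can be written as two small Python functions. This is a sketch with illustrative names and arbitrary numbers; the correlation term is subtracted for the ratio because log u = log x - log y.

```python
def product_variance(x, y, s_x, s_y, r=0.0):
    """Approximate variance of u = x*y for small errors in x and y."""
    return y ** 2 * s_x ** 2 + x ** 2 * s_y ** 2 + 2 * r * x * y * s_x * s_y


def ratio_variance(x, y, s_x, s_y, r=0.0):
    """Approximate variance of u = x/y for small errors in x and y."""
    relative = s_x ** 2 / x ** 2 + s_y ** 2 / y ** 2 - 2 * r * s_x * s_y / (x * y)
    return (x / y) ** 2 * relative


# Arbitrary illustrative values, uncorrelated errors.
var_product = product_variance(2.0, 3.0, 0.1, 0.2)

# Equal coefficients of variation and perfect positive correlation:
# the errors cancel in the ratio.
var_ratio = ratio_variance(2.0, 4.0, 0.02, 0.04, r=1.0)
```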

Thus, within the limits of the indicated approximations, and if no
correlation exists, [25] and [27] reduce to the sum of the squared coefficients
of variation as previously noted. The same expressions for the variance
of a sum or product were obtained by Ku (10) using propagation of error
formulas. He also presents a more detailed discussion of the necessary
assumptions and approximations. With these computational aids, we now
consider some numerical examples.


Applications 

Example  One 

Predicted variance. First, we examine the errors inherent in the
determination of the first stage hydrolysis constant of aluminum (7). The
chemical reaction involved is:

Al+3 + H2O = AlOH+2 + H+  [28]

Thus, the negative logarithm of the equilibrium constant, $pK_1$, is defined
according to the theory under test:

$pK_1$ = pAlOH+2 + pH+ - pAl+3  [29]

where the symbol p denotes the negative logarithm of the individual ion
activities (the activity of water is assumed to be unity). Since most
chemical methods measure concentrations, not activities, we immediately
find ourselves faced with the situation anticipated in equations [8] to [10].
First, we recall that ion activities are defined by:

$-\mathrm{pH} = \log (\mathrm{H}) = \log \gamma_H + \log [\mathrm{H}]$  [30]

where parentheses indicate activities, brackets indicate concentrations, and
$\gamma$ is the appropriate activity coefficient. Although activity coefficients are
a complex function of the concentration of all ions in solution (i.e. the
ionic strength), at any given concentration they are constant, and
moreover, we shall assume they are known without error. This latter
assumption can never be proved wrong, since single ion activity coefficients can
never be measured, but are calculated from one of a number of theories.
In any event, different theories are in reasonably good agreement for
dilute solutions, so we shall proceed.

With these definitions in mind, we can rewrite [29] as:

$pK_1 = -\log[\mathrm{AlOH}] - \log[\mathrm{H}] + \log[\mathrm{Al}] - \log\dfrac{\gamma_{AlOH}\,\gamma_H}{\gamma_{Al}}$  [31]

where the chemical valences have been omitted for simplicity. For the
calculation of $pK_1$, the analytical determination of pH (note that the glass
electrode determines activity) is required. The total amount of aluminum
present, [Al_t], could be determined analytically, but in this case (7) it was
calculated from the serial dilution of a carefully standardized stock
solution. Since one [H+] is produced for each [AlOH+2], we equate
their concentrations. The concentration of [Al+3] is then equal to
[Al_t - H+]. Thus we may write [31] as:


$pK_1 = -2\log[\mathrm{H}] + \log[\mathrm{Al}_t - \mathrm{H}] - \log\dfrac{\gamma_{AlOH}\,\gamma_H}{\gamma_{Al}}$  [32]

Now we may use equation [6] to calculate the variance expected,
$\sigma_{pK}^2$, if replicate determinations were made on solution No. 1 shown in
Table 1 (7). First, we must evaluate the expected observational variance
of the term log[Al_t - H]. While this term could be expanded with Taylor's
series as in [12], it is simpler to use [6] for the variance of [Al_t - H],
and then convert to logarithms using the approximation $\sigma_{\log w} = M\sigma_e$.
A graphical evaluation of the data (Table 1) indicates that [Al_t - H] is a
nearly linear function of [Al_t] over the whole range of concentrations
studied, and is nearly linear in [H] for small deviations about a particular
observation. Thus, little error is introduced by our implicit assumption
that the expectation or mean of [Al+3] is the expectation of [Al_t - H]
either for replicate observations or for replicate samples.

Since the concentration of aluminum was not determined
analytically but was calculated from dilution, the errors of observation of [Al_t]
are obtained from an estimate of dilution errors. If we assume the errors
($\sigma_e$) of dilution to be one per cent, the variance of [Al_t] in solution No. 1
is $(0.01 \times 1.00 \times 10^{-2})^2$ or $(1 \times 10^{-4})^2$. We estimate the standard
deviation ($\sigma_{\log w}$) of a pH measurement or of -log[H] to be 0.02 pH units
(7) or $\sigma_e = 0.02/M$ = 4.61 per cent. Since -log[H] is 3.63 or [H] = $2.34 \times 10^{-4}$,
the variance of [H] is $(0.0461 \times 2.34 \times 10^{-4})^2$ or $(1.08 \times 10^{-5})^2$. In
addition, we require the correlation coefficient between [Al_t] and [H] for
replicate observations on a single sample. Although [Al_t] and [H] appear
(Table 1) to be positively correlated, the correlation between the errors of
observation $\delta$[Al_t] and $\delta$[H] is zero as previously discussed. Thus, the
variance of [Al_t - H] is merely the sum or $(1.00 \times 10^{-4})^2$. Since
[Al_t - H] = $97.7 \times 10^{-4}$, the standard error ($\sigma_e$) is 1.03 per cent or the expected
observational variance, $\sigma_o^2$, of log[Al_t - H] is $(0.0045)^2$.
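
The arithmetic for solution No. 1 can be retraced in Python. This is a sketch under the assumptions stated in the text (one per cent dilution error, 0.02 pH unit standard deviation, zero correlation between the errors of observation); the function name is illustrative and M is taken as the common logarithm of e.

```python
import math

M = math.log10(math.e)  # about 0.4343


def observational_sigma_log_diff(al_total, pH, sigma_pH, dilution_rel_error):
    """Return (relative error of [Al_t - H], sigma of log[Al_t - H])
    for replicate observations on a single sample."""
    h = 10 ** (-pH)                           # [H] from -log[H]
    sigma_al = dilution_rel_error * al_total  # standard deviation of [Al_t]
    sigma_h = (sigma_pH / M) * h              # pH error converted to [H] units
    # Errors of observation are taken as uncorrelated, so variances add.
    sigma_diff = math.sqrt(sigma_al ** 2 + sigma_h ** 2)
    rel_error = sigma_diff / (al_total - h)
    return rel_error, M * rel_error


rel_error, sigma_log = observational_sigma_log_diff(1.00e-2, 3.63, 0.02, 0.01)
```

The relative error comes out near 1.03 per cent and the standard deviation of the logarithm near 0.0045, in line with the values quoted above.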

Having determined the observational variance, we now inquire what
the expected variance of log[Al_t - H] would be for the preparation of
replicate samples. Again, we assume the error of dilution in preparing
[Al_t] is one per cent. From a graphical evaluation of the data in Table 1,
we find that this will produce a 0.60 per cent error in [H]. From the
chemistry of equation [28] or from the experimental data it is evident
that an increase in [Al_t] is accompanied by an increase in [H]. Thus,
the correlation is positive, and, for small deviations about the mean is
nearly perfect; accordingly we assume $\rho$ = 1.0. Then the sampling variance
predicted for [Al_t - H] is:

$\sigma_s^2 = (1)^2 (0.01 \times 1.00 \times 10^{-2})^2 + (1)^2 (0.006 \times 2.34 \times 10^{-4})^2 + (2)(1)(-1)(1.0)(0.01 \times 1.00 \times 10^{-2})(0.006 \times 2.34 \times 10^{-4}) = (9.86 \times 10^{-5})^2$  [33]

or the variance of log[Al_t - H] is $(0.0044)^2$. Owing to the correlation
between the variates, the effect is to reduce the predicted variance. Indeed,
we should anticipate this from [6], which for only two variates would be:

$\sigma_u^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\rho\,\sigma_x\sigma_y$  [34]

Clearly, if $a^2$ equals $b^2$, and the errors of x and y are equal, the predicted
variance is zero if the errors are perfectly correlated and $2ab\rho$ has a
negative sign. The negative sign, of course, arises in one of two ways: either
the sum of two negatively correlated variates or the difference between
two positively correlated variates is calculated. As previously indicated,
this situation is usually encountered in the determination of thermodynamic
equilibrium constants, and decreases the expected variance from one
sample to another.
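
The sampling variance of [Al_t - H] for solution No. 1 can be reproduced with the two-variate form of equation [6], taking a = 1, b = -1 and a correlation of unity. The function name and argument layout below are illustrative assumptions.

```python
import math

M = math.log10(math.e)  # about 0.4343


def sampling_sigma_log_diff(al_total, h, rel_err_al, rel_err_h, rho=1.0):
    """Return (sigma of [Al_t - H], sigma of log[Al_t - H]) for
    replicate sample preparation, using a = 1 and b = -1."""
    sigma_al = rel_err_al * al_total
    sigma_h = rel_err_h * h
    # The cross term 2*a*b*rho*sigma_al*sigma_h is negative because b = -1.
    variance = sigma_al ** 2 + sigma_h ** 2 + 2 * (1) * (-1) * rho * sigma_al * sigma_h
    sigma_diff = math.sqrt(variance)
    return sigma_diff, M * sigma_diff / (al_total - h)


# Values quoted in the text for solution No. 1.
sigma_diff, sigma_log = sampling_sigma_log_diff(1.00e-2, 2.34e-4, 0.01, 0.006)
```

The positive correlation between the two concentrations lowers the standard deviation of the difference slightly below that of [Al_t] alone.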

Since we require the expected variance for analyses of replicate
samples, the variance of log[Al_t - H] is clearly the sum of the observational
and sampling errors, or $(0.0063)^2$. However, the present experiment was
based on single analyses of three replicate solutions for each $pK_i$
determined, with the mean pK values reported. This condition, or the reverse,
namely replicate analyses of single samples with only the means reported,
is fairly common in chemical data. Thus we inquire how the sampling and
observational errors are to be combined. Since replicate observations
will reduce $\sigma_o^2$ but not $\sigma_s^2$, while replicate samples will reduce the variance
from both components, the variance of the mean of n observations of k
samples is $(\sigma_o^2/n + \sigma_s^2)/k$ as previously stated. Thus, the variance of
log[Al_t - H] for three replicate samples is $[(0.0045)^2 + (0.0044)^2]/3$ or
$(0.0036)^2$. Before continuing with the calculation of the variance of $pK_1$
we will calculate the observational and sampling variance expected for the
remainder of the experimental observations (Table 1). Since we will later
require the individual estimates of each component of the variance, they
have not been summed in the tabulated data. Initially, the variance of
log[Al_t - H] is dominated by the variance of [Al_t], since [Al_t] >> [H].
Then, as [H] approaches [Al_t], the variance increases considerably,
reflecting the much greater uncertainty in the measurement of [H].
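
The combination rule for n observations on each of k samples is short enough to state as a function. A minimal sketch with an illustrative name, applied to the values quoted for log[Al_t - H]:

```python
import math


def variance_of_mean(sigma_obs, sigma_samp, n_obs, k_samples):
    """Variance of the mean of n observations on each of k samples."""
    return (sigma_obs ** 2 / n_obs + sigma_samp ** 2) / k_samples


# Single analysis of a single sample: the two components simply add.
sigma_single = math.sqrt(variance_of_mean(0.0045, 0.0044, 1, 1))

# Single analyses of three replicate samples, as in the experiment.
sigma_mean = math.sqrt(variance_of_mean(0.0045, 0.0044, 1, 3))
```

Replicating observations alone (larger n) would shrink only the observational component, whereas replicating samples (larger k) divides both.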

Now we have estimates of the variance of all three terms in [32],
having assumed that the variance of the activity coefficient term is zero.
Note that if these uncertainties were known, they could be included at this
point. In order to determine the predicted variance of $pK_1$ for analyses of
replicate samples, we need to consider again the difference between
observational and sample preparation errors. In the case of errors of
observation, it is clear that log[H] and log[Al_t - H] are correlated, since
[Al_t - H] is calculated from the observed [H]. Further, an increase in
[H] causes a decrease in [Al_t - H], so the correlation is negative and we
estimate $\rho$ = -1.0. Using our estimate of the observational variance of
log[Al_t - H], $(0.004)^2$, we find the observational variance of $pK_1$ for
sample No. 1 is:

$\sigma_o^2 = (-2)^2 (0.02)^2 + (1)^2 (0.004)^2 + (2)(-2)(1)(-1.0)(0.02)(0.004) = (0.044)^2$  [35]

We now inquire what the sample preparation variance would be. Again
we assume a one per cent error in [Al_t], which will produce an error of
0.60 per cent in [H] and a 1.01 per cent error in [Al_t - H]. On
conversion to logarithms, and realizing that the correlation between log[H] and
log[Al_t - H] for replicate samples is still negative, since one is calculated
from the other, we may write the sampling variance as:

$\sigma_s^2 = (-2)^2 (0.003)^2 + (1)^2 (0.004)^2 + (2)(-2)(1)(-1.0)(0.003)(0.004) = (0.010)^2$  [36]

Thus the total expected variance is merely the sum of the two, or (0.045)^2.
We may also obtain this directly, by using the sum of the sampling
and observational variance expected for log[Alt - H] and log[H]. We
have already shown that the expected variance for log[Alt - H] is
(0.006)^2, while for log[H] it is (0.02)^2 + (0.003)^2 or (0.020)^2. Now we
see the utility of retaining our original definition of the δx_i, δy_i, δz_i, ...
to include all deviations about the means. This allows us to determine the
correlation coefficient between [H] and [Alt - H] no matter what the
cause: the coefficient is always negative since one is calculated from the
other. Thus we may write:

$$\sigma_{pK}^2 = (-2)^2 (0.02)^2 + (1)^2 (0.006)^2 + (2)(-2)(1)(-1.0)(0.02)(0.006) = (0.046)^2 \qquad [37]$$
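
The arithmetic of [35], [36] and [37] can be checked with a short script. This is a minimal sketch rather than part of the original work: the helper name `propagated_variance` is hypothetical, and it assumes only the two-variate form used above (squared-coefficient variance terms plus one correlation term).

```python
import math

def propagated_variance(coeffs, sds, rho):
    """Variance of a1*x1 + a2*x2 for two correlated variates.

    coeffs: (a1, a2) multipliers of the two log terms
    sds:    (sd1, sd2) standard deviations of the two log terms
    rho:    correlation coefficient between them
    Hypothetical helper, for illustration only.
    """
    a1, a2 = coeffs
    s1, s2 = sds
    return (a1 * s1) ** 2 + (a2 * s2) ** 2 + 2 * a1 * a2 * rho * s1 * s2

# pK1 depends on -2 log[H] and +1 log[Alt - H], with rho = -1.0.
var_obs = propagated_variance((-2, 1), (0.02, 0.004), -1.0)     # equation [35]
var_samp = propagated_variance((-2, 1), (0.003, 0.004), -1.0)   # equation [36]
var_direct = propagated_variance((-2, 1), (0.02, 0.006), -1.0)  # equation [37]

sd_obs = math.sqrt(var_obs)
sd_samp = math.sqrt(var_samp)
sd_sum = math.sqrt(var_obs + var_samp)   # sum of [35] and [36]
sd_direct = math.sqrt(var_direct)

print(round(sd_obs, 3), round(sd_samp, 3), round(sd_sum, 3), round(sd_direct, 3))
```

The two routes to the total (summing [35] and [36], or using the combined variances in [37]) differ only in the third decimal place of the standard deviation.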

We have gone through this example in considerable detail, since some
of the calculations and assumptions are not at all obvious. The remainder
of the predicted values for σ^2_pK were calculated according to [35] and
[36] and compared with [37]. In all cases the agreement was excellent.
In the present case we require the variance of the mean pK for single
observations of three replicate samples; thus, the predicted variance for
sample No. 1 should be [(0.044)^2 + (0.010)^2]/3 or (0.026)^2. Since we
have already calculated the sampling and observational terms for the
variance of log[Alt - H], we may use their sum or (0.006)^2 in [37]
and divide by 3; the result is (0.026)^2. Since the difference between these
two methods is quite small for the first few solutions in Table 1, the
expected variance for the mean pK was calculated in both ways. Again
the agreement was good and the estimates for the variance of the mean pK
are shown in Table 1. Clearly, we were correct in anticipating
non-homogeneity of variance; the predicted values depend on the magnitude of and
the correlations between the variates calculated from the experimental
observations, as well as on the errors of sampling and observation.

Now  we  may  use  our  weighting  scheme,  equation  [13],  to  compare 
the  predicted  with  the  observed  variance.  We  estimate  chi-square  as 
follows: 

$$\chi^2 = \frac{(n-1)\,s_{pK}^2}{\sigma_{pK}^2} = \sum_i \left[\frac{(pK_i - \overline{pK})^2}{\sigma_i^2}\right] \qquad [38]$$

The result, 64.2, should be distributed as chi-square with n - 1 or 7
degrees of freedom. Since the probability is less than one per cent that this
value would be obtained in random sampling, we reject the hypothesis
that pK1 is constant over the range of experimental conditions. Closer
examination of the data in Table 1, however, indicates that the value of
pK1 for the most dilute solution (No. 8) is suspiciously low: the mean
pK for these 8 observations is 4.970 with a standard deviation calculated
in the usual fashion of 0.135. We recall that in this particular case, three
replicate samples were prepared for each pK1, so that a t-test applied to
the original data would indicate whether this mean pK should be rejected.
An analysis of the original unpublished data indicates that indeed this is
the case. However, we wish to proceed with an analysis of the data as
presented, which usually will not involve sufficient replications for the
usual statistical tests. Bliss (3) describes a simple test for rejecting
outliers; the ratio of the range to the standard deviation is calculated and
then compared with expected values (3) for sampling from a normal
population. In this case, the ratio is 0.44/0.135 or 3.26. The probability of
obtaining a ratio this large is only 0.10 and strengthens our suspicion that
this sample should not be included; indeed, cogent chemical reasons have
been advanced (7) for rejecting this sample. Chi-square computed
according to [38] for the remaining 7 samples is 13.9, which is slightly greater
than the expected value for p = 0.05. Thus, we tentatively accept the
hypothesis that the remaining 7 samples with mean pK = 5.02 and standard
deviation 0.041 are drawn from the same population; i.e. pK1 is a constant.

Since our predicted variances, σ^2_pK, seem reasonably homogeneous,
we inquire whether a pooled variance would be acceptable in the
calculation of chi-square. Bartlett's test or a maximum variance ratio test (3)
does not provide an unequivocal answer, since we have assumed σ^2_pK
to represent the population variance with infinite degrees of freedom.
However, the maximum variance ratio is (0.031)^2/(0.026)^2 = 1.42, which
is not significant (3) unless the degrees of freedom allowed exceed 60 and
approach infinity. For practical purposes, it is useful to have a pooled
variance, and we shall assume for this reason that the variance is in fact
homogeneous. The pooled variance for the 7 observations is then
(1/n) Σ σ^2_pK = (0.0276)^2, and chi-square computed in the usual manner
is 13.3. Thus, our weighting scheme has not altered chi-square materially.
When the predicted variance is less homogeneous, however, as it is for all
8 samples, similar calculations yield a chi-square of 117 compared with
64.2 from equation [38].
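
The relation between the weighted statistic [38] and the pooled form can be sketched in a few lines. The function names are hypothetical, the four pK values are invented purely to exercise the code (Table 1 is not reproduced here), and the sketch assumes deviations are taken about the unweighted mean.

```python
def weighted_chi_square(values, variances):
    """Sum of squared deviations about the unweighted mean,
    each divided by its own predicted variance (form of [38])."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 / var for v, var in zip(values, variances))

def pooled_chi_square(n, s, sigma):
    """(n - 1) s^2 / sigma^2 with a single pooled predicted variance."""
    return (n - 1) * s ** 2 / sigma ** 2

# Pooled form with the figures quoted in the text:
# 7 samples, s = 0.041, pooled sigma = 0.0276.
chi2_pooled = pooled_chi_square(7, 0.041, 0.0276)
print(round(chi2_pooled, 1))

# With equal predicted variances the weighted form reduces to the pooled form.
# These pK values are hypothetical, not taken from Table 1.
demo_values = [5.00, 5.04, 4.98, 5.06]
demo_sigma = 0.0276
demo_mean = sum(demo_values) / len(demo_values)
demo_s2 = sum((v - demo_mean) ** 2 for v in demo_values) / (len(demo_values) - 1)
chi2_weighted = weighted_chi_square(demo_values, [demo_sigma ** 2] * 4)
chi2_equal = pooled_chi_square(4, demo_s2 ** 0.5, demo_sigma)
```

With the rounded standard deviations the pooled statistic comes out near 13.2; the small difference from the 13.3 quoted above presumably reflects rounding of the inputs.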

We conclude, therefore, that the constant in the mathematical
statement of the theory is in fact observed to be constant. Conversely, if the
chemical theory, equation [28], is accepted, the near equality of the
observed variance of pK1 with that predicted from equation [37]
establishes the validity of our estimation procedure.

Observed variance. Further verification is provided by an analysis of
variance of the original data whose means were used for the calculations
in Table 1. Three observations were made on each of 3 solutions (No. 6)
diluted to 1 × 10^-4 M in [Alt]. The calculated pK1 values and the analysis
of variance are shown in Table 2. Since the replicate observations 1, 2
and 3 on sample A do not necessarily correspond to the same observations
on sample B or C, only a one-way classification is presented. The mean
square for error is then an estimate of the observational variance, σ^2_o, for
repeated observations on the same sample. From Table 2, this is (0.048)^2,
which is remarkably close to the predicted σ^2_o of (0.049)^2 from [35] for
solution No. 6 as shown in Table 1. Similarly, the mean square for samples
is an estimate (3) of σ^2_o + 3σ^2_s; solving for the sampling variance, σ^2_s, we
obtain (0.012)^2, which compares very favorably with the value (0.011)^2
from [36] for solution No. 6 (Table 1).
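
The one-way analysis of variance for the nine pK1 values of Table 2 can be reproduced as follows. This is a minimal sketch using only the tabulated values; the variable names are illustrative, and the variance components are obtained from the expected mean squares stated above.

```python
import math

# pK1 values from Table 2: three observations on each of samples A, B, C.
samples = {
    "A": [5.04, 5.04, 4.98],
    "B": [4.93, 5.02, 4.93],
    "C": [5.04, 4.98, 4.93],
}
k = len(samples)   # number of samples
r = 3              # observations per sample

all_values = [x for obs in samples.values() for x in obs]
grand_mean = sum(all_values) / len(all_values)

ss_total = sum((x - grand_mean) ** 2 for x in all_values)
ss_samples = sum(r * (sum(obs) / r - grand_mean) ** 2 for obs in samples.values())
ss_error = ss_total - ss_samples

ms_samples = ss_samples / (k - 1)       # estimates sigma_o^2 + 3 sigma_s^2
ms_error = ss_error / (k * (r - 1))     # estimates sigma_o^2
f_ratio = ms_samples / ms_error

sd_obs = math.sqrt(ms_error)
sd_samp = math.sqrt((ms_samples - ms_error) / r)

print(round(ss_total, 6), round(ss_samples, 6), round(ss_error, 6))
print(round(sd_obs, 3), round(sd_samp, 3), round(f_ratio, 2))
```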

A similar analysis of variance was made of the original observations
of three replicate samples for each pK1 whose means are shown in Table 1.
Solution No. 8 was omitted as already discussed and solution No. 3 was
omitted because of incomplete replication. The results are shown in
Table 3, where treatments indicate different solutions, i.e. different
concentrations of [Alt]. Differences among treatments were not significant,
substantiating our previous conclusion that pK1 is a constant. Differences
among sample means were not significant either, but the magnitude of the
sample mean square suggests rather large sampling errors. In this case,
the sample mean square is an estimate of σ^2_o + σ^2_i + 6σ^2_s and the error
mean square is an estimate of σ^2_o + σ^2_i, where σ^2_i is an interaction term.
Thus, we find σ^2_s = (0.031)^2, which is considerably larger than predicted
or observed in the data of Table 2. The error mean square corresponds
to (0.052)^2; therefore our best estimate of the population variance is
(0.031)^2 + (0.052)^2 or (0.060)^2. This may be compared with the sum of
the predicted sampling and observational errors from Table 1, which when
pooled are (0.010)^2 + (0.047)^2 or (0.048)^2. A chi-square test is
appropriate for comparing our prediction with the observed variance; thus
chi-square is 12(0.060)^2/(0.048)^2 = 18.8, which is not significant (p > 0.05).
Again, however, it is evident that our prediction is somewhat conservative.
In fact, if our predicted variance (0.048)^2 is used to test the significance
of the treatment mean square, chi-square is 5(0.078)^2/(0.048)^2 or 13.2,
which is greater than the value of 11.1 for p = 0.05. Thus, we might
reject the hypothesis that pK1 is constant, when in fact the analysis of
variance shows it to be constant within our ability to dilute and measure
pH on that particular occasion. Of course, it is evident that a chi-square
test is considerably more conservative than an F test, since infinite degrees
of freedom are assumed for the variance of the denominator in a chi-square
test.
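
The variance components and the two chi-square ratios in this paragraph follow directly from the mean squares of Table 3. The sketch below is illustrative only; it uses the expected mean squares given in the text and, for the chi-square ratios, the rounded standard deviations quoted there.

```python
import math

# Mean squares from the top half of Table 3.
ms_treatments = 0.006046
ms_samples = 0.008570   # estimates sigma_o^2 + sigma_i^2 + 6 sigma_s^2
ms_error = 0.002739     # estimates sigma_o^2 + sigma_i^2

sd_samp = math.sqrt((ms_samples - ms_error) / 6)   # sampling component
sd_error = math.sqrt(ms_error)                     # observational + interaction
sd_treat = math.sqrt(ms_treatments)

# Pooled prediction from Table 1, as quoted in the text.
sd_predicted = math.sqrt(0.010 ** 2 + 0.047 ** 2)

# Chi-square ratios using the rounded values quoted in the text.
chi2_error = 12 * 0.060 ** 2 / 0.048 ** 2
chi2_treat = 5 * 0.078 ** 2 / 0.048 ** 2

print(round(sd_samp, 3), round(sd_error, 3), round(sd_treat, 3))
print(round(sd_predicted, 3), round(chi2_error, 2), round(chi2_treat, 1))
```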

If  we  look  for  an  explanation  for  our  conservative  estimate  of  the 
sampling variance of pK1, it is possibly our estimate of the errors of
dilution;  however,  it  is  doubtful  if  a  careful  analyst  would  make  an  error 
as  large  as  one  per  cent.  Another  possibility  is  some  systematic  error,  such 
as  failing  to  properly  calibrate  the  pH  meter  or  using  dirty  glassware. 
Again  this  implies  an  unusually  sloppy  analyst.  More  likely,  the  observed 
sampling  variability  is  due  to  some  failure  of  the  theory.  Ion  activity 
coefficients  might  have  been  calculated  improperly,  but  this  should  only 
affect  the  treatment  mean  square.  Frink  and  Peech  (7)  found  that  these 
solutions  were  supersaturated  with  respect  to  gibbsite,  and  that  Al(OH)3 
sometimes  precipitated.  This  precipitation  would  be  different  from  sample 
to  sample,  and  thus  we  suspect  a  chemical  rather  than  statistical  failure. 
However, it is apparent that in an important experiment, it may be
worthwhile for the investigator to obtain accurate estimates of his precision,
even  though  he  still  may  not  wish  to  replicate  the  entire  experiment.  In 
any  event,  we  conclude  that  our  estimation  procedure  produces  valid 
results  when  the  experimental  uncertainties  are  known. 

Finally, we should point out that the statistical test previously applied
(7) to these data is incorrect. First, an error was made in the coefficient of
the variance of -2 log[H]; it is of course 2^2 (0.02)^2 rather than 2(0.02)^2 as
stated. Second, the correlation between the variables was neglected.
However, the present examination of the data leads to the same conclusions
as Frink and Peech (7).

Other methods. An alternative method for determining whether the
experimental data fit the theoretical model is suggested by Draper and
Smith (5) in an analysis of the distinction between "lack of fit" and "pure
error". Again, such an analysis is only possible when replicate
observations are available. Moreover, since it involves regression analysis, we
must specify the experimental variable which might be expected to cause
changes in the observed constant. However, it is instructive to apply their
methods to the present data.

It is common in thermodynamics to assume that apparent changes
in equilibrium constants in aqueous solutions are due to changes in ion
activity coefficients. Specifically, since the logarithm of the activity
coefficient is a function of the square root of the ionic strength, it is customary
to plot the observed constant against the square root of the ionic strength
and extrapolate to zero, i.e. to an ideal solution at infinite dilution. Thus,
we shall test by linear regression analysis whether pK1 is a function
of the square root of the ionic strength of the various solutions. These
solutions correspond, of course, to treatments in the previous analysis of
variance.

The regression analysis follows its usual form, with the results shown
in the bottom half of Table 3. Having demonstrated that the linear
regression is not significant, we can use the methods of Draper and Smith (5)
to partition the residual mean square into lack of fit and pure error terms.
Now, since there was no significant lack of fit to the linear model, we have
the remaining pure error mean square which is an estimate of the
population variance. Comparing this term with the pooled error term from
the analysis of variance in the top half of Table 3, we find they are
identical and correspond to a population variance of (0.061)^2. Moreover,
the treatment mean square from the analysis of variance corresponds to
the pooled regression and lack of fit mean squares from the regression
analysis. Thus, this analysis also shows that pK1 is not dependent on
treatment as measured by ionic strength and provides us with an estimate
of the population variance.
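
The partition of the residual sum of squares into lack of fit and pure error can be retraced from the entries of Table 3. The following is a minimal sketch of that bookkeeping only; it does not refit the regression, since the individual observations and ionic strengths are not reproduced here.

```python
import math

# Sums of squares and degrees of freedom from Table 3.
ss_total, df_total = 0.07476, 17
ss_regression, df_regression = 0.00639, 1
ss_pure_error, df_pure_error = 0.04453, 12   # pooled error from the ANOVA

# Residual = total - regression; lack of fit = residual - pure error.
ss_residual = ss_total - ss_regression
df_residual = df_total - df_regression
ss_lack_of_fit = ss_residual - ss_pure_error
df_lack_of_fit = df_residual - df_pure_error

ms_residual = ss_residual / df_residual
ms_lack_of_fit = ss_lack_of_fit / df_lack_of_fit
ms_pure_error = ss_pure_error / df_pure_error

f_regression = (ss_regression / df_regression) / ms_residual
f_lack_of_fit = ms_lack_of_fit / ms_pure_error

# Regression plus lack of fit recovers the treatment sum of squares.
ss_reg_plus_lof = ss_regression + ss_lack_of_fit
sd_population = math.sqrt(ms_pure_error)

print(round(ss_lack_of_fit, 5), df_lack_of_fit, round(f_lack_of_fit, 2))
print(round(ss_reg_plus_lof, 5), round(sd_population, 3))
```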

Example  Two 

Continuing  with  other  examples,  we  evaluate  errors  inherent  in  the 
determination  of  solubility  products.  For  gibbsite,  the  crystalline  form  of 
Al(OH)3,  we  write  (6): 

$$pK_{sp} = pAl^{3+} + 3\,pOH^{-} \qquad [39]$$

Separating activity coefficients, we may express the terms in [39] as
functions of the measured variates:

$$pK_{sp} = -\log[Al_t] + \log[1 + K_1/H] + 3\log[H] - 3\log K_w + \log f(\gamma) \qquad [40]$$

where K1 is the first stage hydrolysis constant, Kw is the ionization
constant of water, and f(γ) contains the activity corrections. Again, we assume
that these constants are known without error, and that the variance of
-log[H] is (0.02)^2. Since in this case [Alt] is determined analytically, the
errors are of the usual sort: we estimate the variance of -log[Alt] from
six analytical determinations as (0.015)^2. The variance for replicate
observations of [1 + K1/H] was evaluated for each value of [H] by first
evaluating the variance of [K1/H] using either [27] or a log conversion
to permit the use of [6]. Then, the variance of log[1 + K1/H] was
evaluated in a manner similar to that previously described for
log[Alt - H], with no correlation between variates (Table 4). Since
[1 + K1/H] is non-linear, we should inquire if errors are introduced by
these methods. Again, we find (Table 4) that for small deviations about
the mean, the departure from linearity is not serious.
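
A first-order (linearized) estimate of the standard deviation of log[1 + K1/H] can be sketched as below. This is not necessarily the procedure of [27] or [6]; it is an illustrative alternative that assumes pK1 = 5.02 (the mean found in Example One) and a standard deviation of 0.02 in -log[H]. The function names are hypothetical.

```python
import math

PK1 = 5.02        # assumed value of pK1 (mean from Example One)
SD_LOG_H = 0.02   # standard deviation of -log[H], as in the text

def log_term(p_h):
    """log10(1 + K1/H), where p_h = -log10[H]."""
    ratio = 10 ** (p_h - PK1)          # K1/H
    return math.log10(1 + ratio)

def sd_log_term(p_h, sd=SD_LOG_H):
    """Linearized standard deviation of log10(1 + K1/H).

    |d log(1 + K1/H) / d log H| = (K1/H) / (1 + K1/H).
    """
    ratio = 10 ** (p_h - PK1)
    return ratio / (1 + ratio) * sd

# First and last rows of Table 4 (-log[H] = 3.63 and 4.72).
for p_h in (3.63, 4.72):
    linear = sd_log_term(p_h)
    exact = log_term(p_h + SD_LOG_H) - log_term(p_h)   # actual shift for one s.d.
    print(p_h, round(log_term(p_h), 3), round(linear, 4), round(exact, 4))

linear_last = sd_log_term(4.72)
exact_last = log_term(4.72 + SD_LOG_H) - log_term(4.72)
```

Under these assumptions the linearized values agree with the variance column of Table 4 to the precision tabulated, and the exact shift for a deviation of one standard deviation differs only slightly from the linear estimate, consistent with the remark that the departure from linearity is not serious.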

Finally, we require the correlation coefficients between the variates in
[40] for replicate observations. In this case, only [H] and [1 + K1/H]
are correlated, and the coefficient is obviously close to -1.0 for small
deviations about the means. Thus, the variance of pKsp expected for the
first sample in Table 4 is:

$$\sigma_{pK}^2 = (-1)^2 (0.015)^2 + (1)^2 (0.001)^2 + (3)^2 (0.02)^2 + (2)(1)(3)(-1.0)(0.001)(0.02) = (0.061)^2 \qquad [41]$$

If we inquire what the error in analyzing replicate samples would be,
we find we have no basis for prediction in this experiment. Most
determinations of solubility products are based on the preparation of solutions
containing the necessary constituent ions (in this case Al^3+ and OH^-) at
concentrations slightly greater and less than the equilibrium concentrations.
The solutions are then seeded with the crystalline solid, and equilibrium
is approached from the resulting supersaturated and undersaturated
solutions. In this sense, preparation of replicate samples is impossible: the
original solutions may be the same, but each approaches equilibrium
independently and the final observed concentrations reflect not only the
errors in preparation of the solution, but also inherent differences in
apparent equilibrium concentrations. It appears, then, that this could only be
measured by experiment. In any event, our previous analysis shows that
errors of sample preparation are much smaller than errors of observation.
Since the present experiment is based on observations of single samples,
the remainder of the estimates of σ^2_pK were calculated from [41] and are
shown in Table 4. Note that σ^2_pK decreases as [Alt] approaches [H],
reflecting the decreased variance as the correlation term becomes more
important.
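
The variance column for pKsp in Table 4 can be regenerated from [41] and the tabulated standard deviations of log[1 + K1/H]. The sketch below assumes the coefficients, precisions, and correlation of -1.0 stated in the text; the function name is hypothetical.

```python
import math

SD_LOG_ALT = 0.015   # standard deviation of -log[Alt]
SD_LOG_H = 0.02      # standard deviation of -log[H]
RHO = -1.0           # correlation between log[H] and log[1 + K1/H]

def sd_pksp(sd_log_term):
    """Standard deviation of pKsp from equation [41]."""
    variance = (
        (-1 * SD_LOG_ALT) ** 2
        + (1 * sd_log_term) ** 2
        + (3 * SD_LOG_H) ** 2
        + 2 * 1 * 3 * RHO * sd_log_term * SD_LOG_H
    )
    return math.sqrt(variance)

# Standard deviations of log[1 + K1/H] for the eight rows of Table 4.
table4_sd_log_term = [0.001, 0.001, 0.002, 0.002, 0.003, 0.003, 0.005, 0.007]
predicted = [round(sd_pksp(s), 3) for s in table4_sd_log_term]
print(predicted)
```

The negative correlation term grows with the variance of log[1 + K1/H], which is why the predicted standard deviation falls slowly down the table.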

Chi-square was computed as before. The result is 158; obviously
these samples are not drawn from the same population, and we reject the
hypothesis that pKsp is a constant. Chemical evidence (6) suggests that
equilibrium was probably not established in these experiments. Data from a
recent examination by Kittrick (8) of solutions equilibrated for four years
are much more likely to fit [39]: from five determinations he obtained a mean
pKsp of 34.03 with standard deviation 0.066, compared with the value
33.57 with s = 0.282 from Frink and Peech (6). Inasmuch as the
calculated variance (Table 4) is reasonably homogeneous, we will use a pooled
variance of (0.059)^2 to test Kittrick's data. In this case, chi-square
calculated in the usual fashion is (4)(0.066)^2/(0.059)^2 or 5.00, which is not
significant (probability > 0.20). Clearly, Kittrick's data indicate that pKsp
for gibbsite is constant when adequate equilibration occurs. In addition,
we have increased confidence that our test agrees with deductions from
chemical evidence regarding the constancy of equilibrium constants.
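
The test of Kittrick's data can be checked numerically. The sketch below is illustrative: the helper name is hypothetical, and it uses the closed-form upper-tail probability of chi-square that holds when the degrees of freedom are even, which avoids the need for statistical tables or external libraries.

```python
import math

def chi_square_tail_even_df(x, df):
    """P(chi-square > x) for an even, positive number of degrees of freedom.

    For df = 2m the tail is exp(-x/2) * sum_{j<m} (x/2)^j / j!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires even, positive df")
    half = x / 2
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms

# Five determinations, s = 0.066, pooled predicted sigma = 0.059.
chi2_kittrick = 4 * 0.066 ** 2 / 0.059 ** 2
p_kittrick = chi_square_tail_even_df(chi2_kittrick, 4)
print(round(chi2_kittrick, 2), round(p_kittrick, 2))
```

The tail probability is a little under 0.3, in line with the statement that the probability exceeds 0.20.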

Example  Three 

Finally, we test an extreme case where large exponents are
encountered, as in the solubility product of hydroxyapatite. Following Clark
(4), we write:

$$pK_{sp} = 10\,pCa^{2+} + 6\,pPO_4^{3-} + 2\,pOH^{-} \qquad [42]$$

Again, we wish to express this equation in terms of the experimental
variables. The concentration of [PO4] is given by the following
approximation in the pH range 5 to 9 studied by Clark:

$$[PO_4] = \frac{[P_t]\,K_2 K_3}{[H^2 + K_2 H]} \qquad [43]$$

where  K2  and  K3  are  the  appropriate  ionization  constants  of  phosphoric 
acid.  Thus  we  may  express  [42]  as: 

$$pK_{sp} = -10\log[Ca] - 6\log[P_t] + 6\log[H^{7/3} + K_2 H^{4/3}] - 6\log[K_2 K_3] - 2\log K_w + \log f(\gamma) \qquad [44]$$
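
The step from [42] and [43] to [44] is not spelled out; one way to see it, with all activity corrections collected into log f(γ) and pOH written as -log K_w + log[H], is sketched below. This is a reconstruction offered as a reading aid, not a quotation from Clark (4).

```latex
\begin{align*}
6\,pPO_4 &= -6\log[P_t] - 6\log[K_2 K_3] + 6\log[H^2 + K_2 H] \\
2\,pOH   &= -2\log K_w + 2\log[H] \\
6\log[H^2 + K_2 H] + 2\log[H]
         &= 6\log\bigl[(H^2 + K_2 H)\,H^{1/3}\bigr]
          = 6\log[H^{7/3} + K_2 H^{4/3}]
\end{align*}
```

Combining the two terms in [H] in this way leaves a single function of [H] whose variance must be evaluated.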

Here, we have a complex function in [H] to evaluate, which can most
easily be done by our now familiar scheme of finding the variance of
[H^(7/3) + K2 H^(4/3)] first and then converting to logarithms, realizing of course
that [H]^(7/3) and K2[H]^(4/3) are correlated for replicate observations. The
calculations are tedious and will not be presented here; the variance of
log[H^(7/3) + K2 H^(4/3)] for the data in Clark's Table 1 decreased from
(0.047)^2 for the most acid solution (No. 1) to (0.027)^2 for the most alkaline
solution (No. 27). We assume the variance of the log[Ca] and log[Pt]
determinations to be (0.015)^2, the same precision with which we measured
[Alt]. Again, we require the expected variance for replicate observations
on a single sample, and in this case the variates are not correlated. The
predicted variance of pKsp is then merely the sum, or:

$$\sigma_{pK}^2 = (-10)^2 (0.015)^2 + (-6)^2 (0.015)^2 + (6)^2 (0.047)^2 = (0.332)^2 \qquad [45]$$

The remainder of the values for σ^2_pK were calculated according to [45].
Chi-square was then computed and found to be 143, which is highly
significant (p < 0.01). Since Clark obtained a mean pKsp of 115.40 with
standard deviation 0.707, we could anticipate this result since the predicted
standard deviation, equation [45], is much smaller and decreases from
0.332 for solution No. 1 to 0.238 for solution No. 27. Thus, we must
conclude that these observations of pKsp were not drawn from the same
population, or in other words pKsp is not constant over the range of
experimental conditions. We might inquire whether we have estimated the
experimental uncertainties properly, a question of considerable significance
where large exponents are involved. Misestimation is possible, but our
examination of Kittrick's data leads us to believe that our estimates of precision
can be attained in practice.
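
The range of predicted standard deviations quoted for [45] can be approximated by a first-order calculation. This sketch makes several assumptions not stated in the text: pK2 of phosphoric acid is taken as 7.20, the most acid and most alkaline solutions are placed at the ends of the stated pH range (5 and 9) because Clark's individual pH values are not reproduced here, and the variance of the [H] term is obtained by linearization rather than by the authors' tedious calculation.

```python
import math

PK2 = 7.20          # assumed second ionization constant of phosphoric acid
SD_LOG_H = 0.02     # standard deviation of -log[H]
SD_LOG_CA = 0.015   # standard deviation of log[Ca]
SD_LOG_PT = 0.015   # standard deviation of log[Pt]

def sd_log_h_term(p_h):
    """Linearized s.d. of log10(H^(7/3) + K2 * H^(4/3)).

    d log(...) / d log H = (7/3 * H + 4/3 * K2) / (H + K2).
    """
    h = 10 ** (-p_h)
    k2 = 10 ** (-PK2)
    sensitivity = (7 / 3 * h + 4 / 3 * k2) / (h + k2)
    return sensitivity * SD_LOG_H

def sd_pksp(sd_h_term):
    """Standard deviation of pKsp from equation [45] (no correlation terms)."""
    return math.sqrt(
        (-10 * SD_LOG_CA) ** 2 + (-6 * SD_LOG_PT) ** 2 + (6 * sd_h_term) ** 2
    )

sd_acid = sd_log_h_term(5.0)       # assumed pH of the most acid solution
sd_alkaline = sd_log_h_term(9.0)   # assumed pH of the most alkaline solution
print(round(sd_acid, 3), round(sd_alkaline, 3))
print(round(sd_pksp(0.047), 3), round(sd_pksp(0.027), 3))
```

Under these assumptions the sensitivity of the [H] term moves from about 7/3 in acid solution to about 4/3 in alkaline solution, which reproduces the fall from 0.047 to 0.027 and hence from 0.332 to 0.238 in the predicted standard deviation of pKsp.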


Conclusions 

The applicability of a thermodynamic theory is tested by observing
whether the constant in the mathematical expression of the theory is in
fact constant. Since the observations from which the constant is calculated
are necessarily inexact, a chi-square test is used to compare the observed
variance of the constant with the variance predicted from the
constituent observations.

Thermodynamic  equilibrium  constants  are  usually  expressed  as  linear 
functions  of  logarithmic  terms,  calculated  from  experimental  data  over  a 
wide  range  of  conditions.  However,  the  variance  of  the  derived  constant 
cannot  be  seen  intuitively,  particularly  when  large  exponential  terms  are 
included.  An  equation  was  derived,  therefore,  to  describe  the  variance  of 
a  sum  of  variates,  with  explicit  treatment  of  exponential  terms,  corrections 
for  correlations  among  the  experimental  uncertainties,  and  separation  of 
sample  preparation  and  observational  errors.  In  addition,  the  problems  of 
translation  of  linear  to  logarithmic  units  were  considered.  Comparison  with 
several sets of experimental data indicates that the derived equation
correctly predicts the observed variance of the constant caused by
experimental uncertainties.


Table 2. Thermodynamic hydrolysis constant of aluminum for three observations
on three samples and an analysis of variance

                         Samples
Observations       A         B         C
     1            5.04      4.93      5.04
     2            5.04      5.02      4.98
     3            4.98      4.93      4.93
     Σ           15.06     14.88     14.95

Source        SS          DF     MS          F
Total         0.019355     8
Samples       0.005488     2     0.002744    1.19 ns
Error         0.013867     6     0.002311


Table 3. Analysis of variance of the original data for aluminum
hydrolysis from Table 1

Source          SS         DF     MS          F
Total           0.07476    17
Treatments      0.03023     5     0.006046    2.21 ns
Samples         0.01714     2     0.008570    3.13 ns
Error           0.02739    10     0.002739
Pooled error    0.04453    12     0.003711

Source          SS         DF     MS          F
Total           0.07476    17
Regression      0.00639     1     0.006390    1.49 ns
Residual        0.06837    16     0.004273
Lack of fit     0.02384     4     0.005959    1.61 ns
Pure error      0.04453    12     0.003711
Reg + LoF       0.03023     5     0.006046


Table 4. Thermodynamic solubility product for gibbsite derived from
measurements of pH and [Alt]

  Experimental                  Derived                          Variance
-log[Alt]   pH     -log[H]  log[1+K1/H]  log f(γ)   pKsp    log[1+K1/H]   pKsp
  2.00     3.70     3.63      0.018        0.41     33.60    (0.001)^2   (0.061)^2
  2.02     3.91     3.84      0.028        0.41     33.00    (0.001)^2   (0.061)^2
  3.00     3.96     3.93      0.034        0.21     33.51    (0.002)^2   (0.060)^2
  3.05     4.01     3.98      0.038        0.18     33.39    (0.002)^2   (0.060)^2
  4.08     4.21     4.20      0.061        0.06     33.66    (0.003)^2   (0.059)^2
  4.20     4.23     4.22      0.063        0.06     33.72    (0.003)^2   (0.059)^2
  5.46     4.58     4.58      0.134        0.05     33.96    (0.005)^2   (0.057)^2
  5.59     4.72     4.72      0.177        0.02     33.69    (0.007)^2   (0.055)^2